Abstract

We give an intuitive yet general explanation of the finite-size
effect in scale-free networks in terms of the degree distribution
of the starting network. This result clarifies the relevance of
the starting network in the final degree distribution. We use two
different approaches: the deterministic mean-field approximation
used by Barabási and Albert (but taking into account the nodes of
the starting network), and the probability distribution of the
degree of each node, which considers the stochastic process.
Numerical simulations show that the accuracy of the predictions
of the mean-field approximation depends on the contribution of
the dispersion to the final distribution. The results in terms of
the probability distribution of the degree of each node are very
accurate when compared to numerical simulations. The analysis of
the standard deviation of the degree distribution allows us to
assess the influence of the starting core when fitting the model
to real data.

Introduction 

Power laws are not a new issue in the scientific literature. The
emergence of scale-free behavior in the distribution of the sizes
of biological genera, incomes, words in a text, scientific
citations, etc., has been widely studied (and several times
re-invented) in the past century (see [1] for an interesting
review of these re-inventions). One of the most famous works on
this subject nowadays is the one by Barabási, Albert and Jeong
(BA) [2], in which they introduce the "preferential attachment"
model for social networks such as the world wide web [3] and the
network of movie actors [4]. In this model, nodes get new links
from new nodes in the network with probability proportional to
their degree. This way, nodes with high degree are more likely to
receive new links, a rich-get-richer mechanism that renders a
power-law distribution of the network degree. This model has been
widely used by other researchers (see, for example, [5] for an
exhaustive review of other models constructed thereafter, and
references therein).

However, some investigations showed that the model of
preferential attachment of BA departs from the predicted
power-law behavior in small networks [6] [7] [8] [9]. The
influence of the initial nodes from which the network starts
growing was acknowledged, but no general prediction of the effect
of these nodes on the degree distribution was made, except for
particular cases [6]. Although the attempts to study this effect
have been both numerical and theoretical, to our knowledge there
are no intuitive, general explanations of the process, nor
predictions of the final degree distribution in these networks,
in the scientific literature yet.

In this letter we find a general, theoretical prediction of the
final degree distribution of finite networks growing with
preferential attachment in terms of the degree distribution of
the starting network. We obtain an expression of the final
distribution using two different approaches: the well-known,
deterministic mean-field approximation (with the contribution of
the nodes of the starting network), and the expected probability
distribution of the degree of each node, which considers the
stochastic process. The methods used are very simple and
intuitive, and the numerical simulations support the theoretical
results very well. One of our main findings is the relevance of
the starting nodes of the network in the final degree
distribution, which must be considered when fitting the model to
real data.

Model definition 

The model on which we are going to focus is the original one
introduced by BA [2]. In this model, at every time step, a new
node arrives at the network and attaches to other nodes by m
undirected new links, the probability of any node in the network
gaining one of these new links being proportional to its degree.
Notice that, in order for the process to be well defined, a
starting network (or core) to which the first new node may link
is needed. We will not allow multiple links between two nodes,
thus the starting core must contain at least m nodes. We do not
consider nodes with null degree since, in this model, these nodes
would never get any links.
We define $t$ as the number of nodes at each time step and $t_0$
as the number of nodes in the starting core, with $t_0 \ge m$. If
a node has degree $k$ at time $t$, then the probability of this
node gaining a new link when node $t+1$ arrives is, according to
the model of BA [2], $\pi_{k,t} = Ak$, where $A$ is a normalizing
constant fixed by requiring the expected number of new links in
the network to be $m$, $\sum_{i=1}^{t} \pi_{k_i,t} = m$. This
condition renders

$$\pi_{k,t} = \frac{m k}{\sum_{i=1}^{t} k_i}. \tag{1}$$

If the mean number of links per node in the starting core is
$m_0$, then the total degree of the network at time $t$ is
$\sum_{i=1}^{t} k_i = 2[(m_0 - m)t_0 + mt] = 2m(\mu t_0 + t)$,
with $\mu = (m_0/m - 1)$. Notice that
$\pi_{k,t} = k/(2(\mu t_0 + t))$ is such that the dynamics of
every node is independent of the rest of the nodes in the
network. Therefore, whatever the approach used to describe the
dynamics of each node, we can use this result to calculate the
degree distribution of the network for each $t > t_0$, knowing
the initial distribution at $t_0$.
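The two closed forms above can be checked numerically. The following is a minimal sketch, not part of the original letter: the function names and the example core (10 fully connected nodes, so m0 = 4.5 links per node) are illustrative assumptions.

```python
def total_degree(t, t0, m, m0):
    """Sum of all node degrees when the network has t nodes."""
    return 2 * ((m0 - m) * t0 + m * t)


def attach_prob(k, t, t0, m, m0):
    """pi_{k,t}: chance that a node of degree k gains a link when node t+1 arrives."""
    mu = m0 / m - 1
    return k / (2 * (mu * t0 + t))


# Illustrative core (an assumption): 10 fully connected nodes, so every
# degree is 9 and there are 45 links, i.e. m0 = 4.5 links per node.
t0, m, m0 = 10, 3, 4.5
core_degrees = [9] * t0
probs = [attach_prob(k, t0, t0, m, m0) for k in core_degrees]
```

In this example the probabilities of the core nodes add up to m, as the normalization condition requires, and each of them agrees with m k divided by the total degree.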

Mean-field approximation 

We start with the mean-field approach followed by BA [2], which
consists in a continuum approximation in both degree and time in
such a way that the rate of change of the degree of any node,
$k$, equals its expected value, $\pi_{k,t}$:

$$\frac{dk}{dt} = \frac{k}{2(\mu t_0 + t)}. \tag{2}$$

Integration of eq. (2) renders the deterministic degree at time
$t$ of a node that has degree $\kappa$ at time $\tau$,
$k(t) = h(t; \kappa, \tau)$, with

$$h(t; \kappa, \tau) = \kappa \sqrt{\frac{\mu t_0 + t}{\mu t_0 + \tau}}, \tag{3}$$



and $t, \tau \ge t_0$. From expression (3) it follows that, for
fixed $t$ and $\tau$, $k(t)$ strictly increases with $\kappa$,
and for fixed $t$ and $\kappa$, $k(t)$ strictly decreases with
$\tau$. Usually, $\kappa = m$ is taken for all nodes, and the
usual asymptotic, power-law behavior is obtained [2]. However,
notice that this initial condition is only valid for the nodes
added to the network. When we consider the case of the added
nodes ($\tau > t_0$) separately from the case of the nodes of the
starting core ($\tau = t_0$), the finite-size effect emerges.
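As an illustration of expression (3), the sketch below evaluates the mean-field trajectory for a core node and for a node added later. The function name `h` mirrors the text; the numerical values (t0 = 100, mu = 0, t = 1000) are assumptions chosen only for the example.

```python
import math


def h(t, kappa, tau, mu, t0):
    """Mean-field degree at time t of a node that had degree kappa at time tau."""
    return kappa * math.sqrt((mu * t0 + t) / (mu * t0 + tau))


# Illustrative values (assumptions): a core of t0 = 100 nodes with mu = 0,
# grown to t = 1000 nodes.
t0, t, mu = 100, 1000, 0.0
core_node = h(t, 20, t0, mu, t0)      # core node that started with degree 20
late_node = h(t, 10, 500, mu, t0)     # node added at tau = 500 with degree m = 10
```

Evaluating `h` backwards in time, `h(t0, core_node, t, mu, t0)`, recovers the starting degree, which is the inversion used later to trace a final degree back to the core.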

Let $F_m(k,t)$ be the complementary cumulative distribution of
the degree of the network at time $t$ under the mean-field
approximation. Thus $F_m(k,t)$ gives the fraction of nodes at
time $t$ with degree greater than or equal to $k$ for
$t \ge t_0$. Let $F_0(k)$ be the corresponding degree
distribution of the starting core at time $t_0$, thus
$F_m(k,t_0) = F_0(k)$. We define $k_m(t)$ as the degree at time
$t$ of the nodes that had degree $m$ at time $t_0$, i.e.,
$k_m(t) = h(t; m, t_0)$. Therefore, all the added nodes of the
network must have degree smaller than $k_m(t)$ at time $t$.
Similarly, we define $k_0(k,t)$ as the degree that a node should
have at time $t_0$ in order to have degree $k$ at time $t$; from
eq. (3) it follows that $k_0(k,t) = h(t_0; k, t)$.



At time $t$, nodes with degree $k \ge k_m(t)$ cannot come from
the added nodes, but from the nodes that in the starting core had
degree $k_0(k,t)$, instead. Thus the fraction of nodes with
degree greater than or equal to $k$, with $k \ge k_m(t)$, is
$(t_0/t) F_0(k_0(k,t))$. For $m < k < k_m(t)$, the fraction of
nodes with degree greater than or equal to $k$ coming from the
starting core is $(t_0/t) F_0(k_0(k,t))$, but the nodes added
until time $\tau^*$, with $\tau^*$ such that
$h(t; m, \tau^*) = k$, must also be considered, rendering
$(\tau^* - t_0)/t$. Finally, for $k \le m$ all added nodes have
degree greater than or equal to $k$, and the contribution from
the starting core is similar to the other ranges. Therefore, the
final degree distribution at time $t$ is



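$$F_m(k,t) = \begin{cases}
1 - \dfrac{t_0}{t}\left[1 - F_0(k_0(k,t))\right], & k \le m, \\[2mm]
\left(\dfrac{\mu t_0}{t} + 1\right)\left(\dfrac{m}{k}\right)^2 - \dfrac{\mu t_0}{t} - \dfrac{t_0}{t}\left[1 - F_0(k_0(k,t))\right], & m < k < k_m(t), \\[2mm]
\dfrac{t_0}{t} F_0(k_0(k,t)), & k \ge k_m(t).
\end{cases} \tag{4}$$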

The finite-size effect that makes expression (4) depart from the
classical power law comes from the starting core, through the
terms in $(t_0/t)F_0(k_0(k,t))$, and from the finite number of
added nodes, which yields the emergence of $k_m(t)$. Notice that
these contributions vanish in the limit $t \to \infty$, where the
usual asymptotic result is recovered.
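The counting argument behind the mean-field distribution can be written as a short function. This is a sketch under stated assumptions: the names `mean_field_ccdf` and `two_node_core` are hypothetical, and the example core (two linked nodes, m = 1, hence m0 = 0.5 and mu = -0.5) is chosen only for illustration.

```python
import math


def mean_field_ccdf(k, t, t0, m, mu, F0):
    """F_m(k, t) following the three ranges of expression (4).

    F0 is a callable giving the complementary cumulative degree
    distribution of the starting core.
    """
    stretch = math.sqrt((mu * t0 + t) / (mu * t0 + t0))
    k0 = k / stretch          # k_0(k, t) = h(t0; k, t)
    km = m * stretch          # k_m(t)    = h(t; m, t0)
    core = (t0 / t) * F0(k0)  # contribution of the starting core
    if k <= m:
        return (t - t0) / t + core
    if k < km:
        tau_star = (mu * t0 + t) * (m / k) ** 2 - mu * t0  # h(t; m, tau*) = k
        return (tau_star - t0) / t + core
    return core


def two_node_core(k):
    """F0 for two linked nodes: both have degree 1."""
    return 1.0 if k <= 1 else 0.0


# Two linked nodes, m = 1: one link shared by two nodes gives m0 = 0.5,
# hence mu = m0/m - 1 = -0.5.
curve = [mean_field_ccdf(k, 1000, 2, 1, -0.5, two_node_core) for k in range(1, 101)]
```

The middle branch is written in terms of tau* rather than in the expanded form, so that each term can be matched with the sentence of the argument it comes from.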

Previous works on the finite-size effect [6] [7] [8] studied the
ratio between the actual final degree density distribution of the
finite network and the asymptotic power law of BA and found that,
for networks growing from the same starting core, this ratio is

$$\frac{f(k,t)}{f(k,t\to\infty)} = w(k/\sqrt{t}), \tag{5}$$

where $w(x)$ is the cut-off function, which depends on the
starting core used. From expression (4) we can calculate such a
ratio within the mean-field approximation, rendering



$$\frac{f_m(k,t)}{f_m(k,t\to\infty)} = \frac{\mu t_0 + t}{t}\, u(k,t), \tag{6}$$

where $f_m(k,t) = -dF_m(k,t)/dk$ is the density distribution in
the mean-field approximation, $f_m(k,t\to\infty) = 2m^2 k^{-3}$
is the asymptotic density obtained with the BA methodology,

$$u(k,t) = \begin{cases}
1 + g(k_0(k,t)), & m < k < k_m(t), \\
g(k_0(k,t)), & k \ge k_m(t),
\end{cases} \tag{7}$$



$g(k) = k^3 f_0(k) / (2m^2(\mu+1))$ and $f_0(k) = -dF_0(k)/dk$ is
the degree density distribution of the starting core. Noticing
that $k_0(k,t) = k\sqrt{(\mu t_0 + t_0)/(\mu t_0 + t)}$,
expression (6) resembles expression (5) when
$\mu t_0/t \approx 0$.
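The step from expression (4) to expressions (6) and (7) can be checked by differentiating the middle branch of (4) with respect to $k$. The following is a sketch of that calculation using only the definitions above, with $k_0 \equiv k_0(k,t)$ as shorthand and $k^2/k_0^2 = (\mu t_0 + t)/((\mu+1)t_0)$ used in the last equality.

```latex
\begin{align*}
f_m(k,t) &= -\frac{d F_m(k,t)}{dk}
  = \frac{\mu t_0 + t}{t}\,\frac{2m^2}{k^3}
  + \frac{t_0}{t}\, f_0(k_0)\,\frac{d k_0}{dk},
  \qquad \frac{d k_0}{dk} = \frac{k_0}{k}, \\
\frac{f_m(k,t)}{2m^2 k^{-3}}
  &= \frac{\mu t_0 + t}{t}
  \left[ 1 + \frac{t_0}{\mu t_0 + t}\,\frac{k^2}{k_0^2}\,
  \frac{k_0^3 f_0(k_0)}{2m^2} \right]
  = \frac{\mu t_0 + t}{t}
  \left[ 1 + \frac{k_0^3 f_0(k_0)}{2m^2(\mu+1)} \right].
\end{align*}
```

For $k \ge k_m(t)$ the first term of $f_m(k,t)$ is absent, which leaves only the second term inside the brackets.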

Expressions (4) and (6) are obtained within the well-known
mean-field approximation as a consequence of considering the
starting core degree distribution in the calculation of
$F_m(k,t)$. However, this approach is strictly deterministic: the
degree distribution of the starting $t_0$ nodes at time $t$ is
the same initial distribution at time $t_0$, stretched by a
factor $\sqrt{(\mu t_0 + t)/(\mu t_0 + t_0)}$; no effect of the
dispersion of the degree of nodes as a consequence of the
stochastic process is considered at all. In the next section we
will consider the full stochastic process when calculating the
final degree distribution.



Probability distribution of the degree of a node 

Expression (4) is the result of a counting procedure including
the starting nodes within the deterministic mean-field
approximation. However, a similar procedure with no further
approximations can be followed in order to obtain the expected
complementary cumulative degree distribution of the network,
$F_e(k,t)$. With mean-field, the stochastic dynamics of every
node was approximated by the deterministic eq. (2); but the real
stochastic process can be described by the probability
distribution $P(k,t|\kappa,\tau)$, which represents the
probability of a node having degree $k$ at time $t$ if it had
degree $\kappa$ at time $\tau$, $t \ge \tau$. The probability of
gaining a new link at each time step, $\pi_{k,t}$, relates the
probabilities $P(k,t|\kappa,\tau)$ and $P(k,t-1|\kappa,\tau)$ in
the recurrence relation

$$P(k,t|\kappa,\tau) = P(k,t-1|\kappa,\tau)\,(1 - \pi_{k,t-1}) + P(k-1,t-1|\kappa,\tau)\,\pi_{k-1,t-1}. \tag{8}$$

The number of links that a node may receive is constrained by the
number of added nodes, $k - \kappa \le t - \tau$, and the degree
of any node cannot decrease, so $k \ge \kappa$ for every
$t \ge \tau$; thus $P(k,t|\kappa,\tau) = 0$ for all the sets
$(k,t)$ and $(\kappa,\tau)$ that do not fulfill these conditions.
Therefore, $P(k,t|\kappa,\tau)$ can be numerically calculated
from eq. (8) using $P(\kappa,\tau|\kappa,\tau) = 1$ as the
initial condition. Defining the probability distribution
$f_e(k,t)$ as the expected probability of finding a node in the
network with degree $k$ at time $t$, then the expression of
$f_e(k,t)$ is

$$f_e(k,t) = \sum_{\kappa=0}^{k} \frac{t_0}{t}\, f_0(\kappa)\, P(k,t|\kappa,t_0) + \frac{1}{t} \sum_{\tau=t_0+1}^{t} P(k,t|m,\tau), \tag{9}$$

where $f_0(\kappa)$ stands for the probability distribution of
the degree of the starting core at time $t_0$, and thus
$f_e(k,t_0) = f_0(k)$. The first term in expression (9) comes
from the nodes in the starting core: $(t_0/t) f_0(\kappa)$ stands
for the expected fraction of nodes at time $t$ that had degree
$\kappa$ in the starting core, and $P(k,t|\kappa,t_0)$ is the
probability of these nodes having degree $k$ at time $t$; the
result is summed over all possible degrees that may affect
$f_e(k,t)$. The second term refers to the added nodes: from the
node added at time $t_0 + 1$ to the last node added at time $t$,
the probability of these nodes having degree $k$ at time $t$ is
summed and rescaled to the actual number of nodes, $t$. The
cumulative distribution $F_e(k,t)$ can be calculated from (9)
using the expression

$$F_e(k,t) = \sum_{j \ge k} f_e(j,t). \tag{10}$$
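A compact way to evaluate the expected distribution numerically is sketched below. It is one possible implementation, not necessarily the one used for the figures: because recurrence (8) is linear, the sums in (9) can be accumulated in a single pass over expected node counts, adding one node of degree m at each step. The function names are hypothetical, and the attachment probability is taken as k/(2(mu t0 + s)) at network size s.

```python
def expected_density(t, t0, m, mu, f0):
    """f_e(., t) as a dict {degree: probability}.

    f0 maps each degree present in the starting core to its fraction.
    """
    counts = {k: t0 * frac for k, frac in f0.items()}  # expected counts at size t0
    for s in range(t0, t):                  # node s+1 arrives at a network of size s
        denom = 2 * (mu * t0 + s)
        new = {}
        for k, n in counts.items():
            pi = k / denom                  # pi_{k,s}
            new[k] = new.get(k, 0.0) + n * (1 - pi)     # no link gained
            new[k + 1] = new.get(k + 1, 0.0) + n * pi   # one link gained
        new[m] = new.get(m, 0.0) + 1.0      # the new node enters with degree m
        counts = new
    return {k: n / t for k, n in counts.items()}


def ccdf(density):
    """F_e(k, t): sum of f_e(j, t) over j >= k."""
    out, acc = {}, 0.0
    for k in sorted(density, reverse=True):
        acc += density[k]
        out[k] = acc
    return out


# Illustrative case (an assumption): two linked nodes, m = 1, mu = -0.5.
fe = expected_density(200, 2, 1, -0.5, {1: 1.0})
Fe = ccdf(fe)
```

Two quick consistency checks for any choice of core are that the density sums to one and that its mean degree equals 2m(mu t0 + t)/t.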




Figure 1: Complementary cumulative distributions versus degree in
log-log scale for a network of $t = 1000$ nodes. o: averaged
distribution of 1000 synthetically generated networks,
$\langle F_{syn}(k,t) \rangle$; *: final distribution of one of
the synthetic networks, $F_{syn}(k,t)$; red, solid line:
theoretical, expected distribution $F_e(k,t)$; blue, dashed line:
theoretical mean-field approximation $F_m(k,t)$. The networks
grow from a starting core with: (a) only two linked nodes
($t_0 = 2$ nodes with degree distributed according to
$\delta_{1,k}$), with $m = 1$; (b) $t_0 = 10$ nodes, with every
node connected to the others (and therefore with a degree
distributed according to $\delta_{9,k}$) and $m = 10$ (inset: the
same, for $m = 3$).
Numerical results

In order to check the results of expressions (4), (9) and (10) we
need to simulate the stochastic process defined by expression
(1). However, for $m > 1$, this process does not cover the space
of probabilities of the whole system, i.e., the sum of the
probabilities for all possible choices of the nodes that get a
new link is not normalized to unity, and thus condition

$$\sum_{i_1=1}^{t} \sum_{i_2<i_1} \cdots \sum_{i_m<i_{m-1}} \pi_{k_{i_1},t} \cdots \pi_{k_{i_m},t} = 1 \tag{11}$$

is not fulfilled for $m > 1$ (though it is for $m = 1$).
Therefore, for $m > 1$, there is no such stochastic process.

Nonetheless, we can consider the following stochastic linking
process when a new node arrives: the $m$ nodes are chosen
sequentially, each one with probability $k/\sum_j k_j$, where the
sum does not contain previously chosen nodes, thus avoiding
repetition. Expression (1) describes exactly this stochastic
process for $m = 1$, and it is a good approximation for
$t \gg m$. For $t \gtrsim m$, there is an exclusion effect that
makes the probability highly non-linear with respect to the
degree (the case $t = m$, where all nodes should get a new link
with probability equal to 1, shows the inaccuracy), and the model
does not describe the growing process well in this regime.
However, as the network grows, the model will eventually capture
the stochastic dynamics of the nodes. As a result, for
$t_0 \gtrsim m$ and $t \gg m$, the dynamics of young nodes are
well approximated by expression (1), but the dynamics of old
nodes may depart from that. These effects render a slight error
in the prediction of the final distribution $F(k,t)$ for large
values of $k$.
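The sequential linking process described above can be simulated directly on the list of degrees, since only the degrees enter the attachment probabilities. The sketch below is an assumption about how such a simulation might be written, not the code used for the figures; `grow_network` is a hypothetical name.

```python
import random


def grow_network(core_degrees, m, t, rng):
    """Grow the network to t nodes and return the list of node degrees.

    core_degrees holds the (positive) degrees of the t0 >= m core nodes.
    """
    degrees = list(core_degrees)
    while len(degrees) < t:
        candidates = list(range(len(degrees)))
        weights = [degrees[i] for i in candidates]
        chosen = []
        for _ in range(m):
            # Pick one node with probability proportional to its degree
            # among the nodes not yet chosen by this new node.
            pick = rng.choices(range(len(candidates)), weights=weights)[0]
            chosen.append(candidates.pop(pick))
            weights.pop(pick)
        for i in chosen:
            degrees[i] += 1
        degrees.append(m)   # the new node enters with degree m
    return degrees


rng = random.Random(0)
tree = grow_network([1, 1], 1, 200, rng)       # two linked nodes, m = 1
dense = grow_network([9] * 10, 3, 50, rng)     # fully connected core, m = 3
```

Removing each chosen node from the candidate list is what prevents multiple links between the new node and the same target; rebuilding the lists at every step keeps the sketch simple at the cost of speed.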

Figures 1 and 2 show the agreement for different starting cores
between the mean-field approximation $F_m(k,t)$, the expected
$F_e(k,t)$ and synthetically generated complementary cumulative
distributions, $F_{syn}(k,t)$. The growing process of fig. 1(a)
has $m = 1$, and therefore the result given by $F_e(k,t)$ is
exact in this case. However, it is also the worst example for the
mean-field approximation, $F_m(k,t)$, since the tail of the
distribution comes from the dispersion of the degree of the
starting nodes in the network; this is clear when comparing the
tail of the distribution of the averaged simulations,
$\langle F_{syn}(k,t) \rangle$, with one of them, $F_{syn}(k,t)$.
In fig. 1(b) there is less dispersion, and the mean-field
approximation works better. The expected distribution $F_e(k,t)$
departs slightly in the tail from $\langle F_{syn}(k,t) \rangle$
since the starting number of nodes is $t_0 = 10$ and the number
of links per new node is $m = 10$, but the result is still
reasonably good since, as we add nodes to the network, the model
improves its accuracy.

In fig. 2 we see the effect on the final distribution of a poorly
connected (fig. 2(a)) and a highly connected (fig. 2(b)) starting
core compared to the number of links per new node, $m$. The heap
that emerges in the tail of the latter comes from the degree of
the starting nodes, higher than the initial degree of the added
nodes. In the mean-field approximation, the position of the heap
depends on $k_m(t)$ and $F_0(k_0(k,t))$. For poorly




Figure 2: Complementary cumulative distributions versus degree in
log-log scale for a network grown from $t_0 = 100$ until
$t = 1000$ nodes. o: distribution of a synthetically generated
network, $F_{syn}(k,t)$; red, solid line: theoretical, expected
distribution $F_e(k,t)$; blue, dashed line: theoretical
mean-field approximation $F_m(k,t)$, from a starting core with:
(a) degree distributed according to a $\delta_{20,k}$ probability
distribution and $m = 10$ new links per added node (inset: the
same, but with a starting core with degree distributed according
to $\delta_{5,k}$ and $m = 20$); (b) degree of the starting core
distributed according to a uniform distribution between degree 10
and 20 and $m = 5$ (inset: the same, for a uniform distribution
between and 20). 
Figure 3: Synthetically generated complementary cumulative
distributions versus degree for a network of $t = 1000$ nodes
with a starting core of $t_0 = 10$ nodes with initial
distributions $f_0(k) = \delta_{1,k}$ (o) and
$f_0(k) = \delta_{9,k}$ (▽). The distributions have been averaged
over 1000 simulations, and the dashed lines correspond to the
mean ± one standard deviation. The inset shows the final degree
distributions when the network grows up to $t = 10^4$ nodes.



connected cores, $k_m(t)$ is very high compared to the degrees
where the starting nodes contribute to $F_m(k,t)$ (notice the
small perturbation of $F_m(k,t)$ around $k = 50$ or 60 in
fig. 2(a)), and the tail of the network comes from the nodes
added at every time step. On the contrary, for highly connected
cores, the contribution of the starting nodes overtakes $k_m(t)$
and we see clearly the heap that they form. This analysis
explains previous numerical results on the finite-size effect
[9].

Figure 2(a) is also an example where the mean-field approximation
gives a good description of the final distribution. This
agreement comes from the low dispersion of the starting nodes in
this case.

In figure 3 the averaged final distributions and their standard
deviations are represented for two different initial
configurations and two different final network sizes. This plot
shows the relevance of the differences that may arise in the
final distribution depending on the starting core considered. The
main differences arise for $F < t_0/t$, where the two averaged
distributions behave completely differently and there is a region
where the distributions do not overlap within the margin of each
other's standard deviation. For $F > t_0/t$, the initial nodes
make the slopes slightly different, too. As the inset shows,
these effects remain even when the final network is ten times
larger than the network of the main plot. Thus the finite-size
effect is not embedded within the errors of the final
distribution as the network grows. Such significantly different
final distributions should give rise to different fitting results
for the same empirical data.
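The averaged distributions and their standard deviations can be obtained from a set of simulated degree lists as sketched below. The helper names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def empirical_ccdf(degrees, k_max):
    """Fraction of nodes with degree >= k, for k = 0, ..., k_max."""
    n = len(degrees)
    return [sum(1 for d in degrees if d >= k) / n for k in range(k_max + 1)]


def ccdf_mean_std(runs):
    """Mean and standard deviation of the empirical CCDF across runs.

    runs is a list of degree lists, one per simulated network.
    """
    k_max = max(max(r) for r in runs)
    curves = [empirical_ccdf(r, k_max) for r in runs]
    means = [mean(col) for col in zip(*curves)]
    stds = [pstdev(col) for col in zip(*curves)]
    return means, stds


# Toy input (an assumption, for illustration only): two tiny "networks".
means, stds = ccdf_mean_std([[1, 1, 2], [1, 2, 2]])
```

Two averaged curves can then be regarded as distinguishable at a given degree when their mean ± one standard deviation bands do not overlap there.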



Conclusions 

In this letter we have shown the basic mechanisms that lead to
the finite-size effect in growing networks with preferential
attachment, namely the nodes of the starting core from which the
network starts growing and the dispersion in the degree of such
nodes during the stochastic process. We have developed a general
formalism that can be used with any deterministic or
probabilistic approach to the dynamics of the degree of a node.
In particular, we have shown the results with the mean-field
approximation (2) and with the probability distribution
$P(k,t|\kappa,\tau)$ defined in expression (8). This methodology
can be applied to any other approach to the dynamics of the
degree of a node, though, as long as the degree distribution of
the starting core of nodes from which the network starts its
growing process, $F_0(k)$, is taken into account when calculating
the distribution $F(k,t)$ of the final network.

We have shown that our results give an intuitive explanation of
the apparently universal behavior of the cut-off function of the
distribution observed in networks with different sizes when
growing from the same starting core [6] [7] [8] (see expressions
(5) and (6)). We have also explained previous results that
studied the emergence of a heap in the final distribution [9] in
terms of the connectivity of the starting core compared to the
number of links per new node, $m$, like the ones shown in fig. 2.

We have also shown that in the original model of BA [2] for
$m > 1$ the attachment process cannot be described with a
probability of attachment to a node independent of that of other
nodes. However, the probability given by expression (1) is a good
approximation of the stochastic dynamics used in the simulations
for $t \gg m$ when $m > 1$. This must be considered when
comparing the theoretical results obtained in this letter with
numerical simulations, since the model of BA [2] (and, therefore,
our theoretical results) is only accurate in this regime. For
regimes with $t \gtrsim m$, the dynamics of the nodes would not
be well described by expression (1) although, as we add nodes to
the network, the approximation improves. This effect renders the
differences observed in the tail of the distributions between the
theoretical and the synthetic results in figs. 1 and 2
(especially in fig. 1(b)).

The methodology followed in this letter can also be applied to
other growing network models where the dynamics of the degree of
a single node can be well approximated by its own state,
regardless of the state of other nodes or the degree distribution
of the network. These models can lead in the asymptotic limit
$t \to \infty$ to power-law-tailed distributions (see [5]) that
are fitted against real data that are supposed to be modelled by
this kind of growing mechanism. However, the shape of the final
distributions of these models may depend strongly on the initial
configuration used, even for large networks, as shown in figure
3, where the final network is $10^2$ and $10^3$ times larger than
the starting core. Clearly, the fitting results may differ
significantly depending on the size and distributions of the
initial cores of the models, and therefore the results presented
in this letter should be considered.

Nowadays, dynamic networks are the hot topic in network
investigation. Reaction kinetics on metabolic networks and the
spread of information or viruses in social networks are examples
of dominant issues in the latest scientific literature. But,
quoting A.-L. Barabási, "to make progress in this direction, we
need to tackle the next frontier, which is to understand the
dynamics of the processes that take place on networks" [10]. We
hope that this work may help in that understanding.